Observing Systems Simulation Experiments:

Their  Role  in  Meteorology 


1.  INTRODUCTION 

This report deals with a particular class of simulation experiments: those
designed to evaluate the use of data from a given observing system in numerical
weather  analysis  and  forecasting.  Simulation  of  data  is  an  attractive  option  when 
evaluating  a  proposed  observing  system  for  which  no  real  data  are  yet  available,  or 
when  the  experiment  requires  reference  to  atmospheric  observations  that  can  be 
considered  perfect. 

It  is  inherent  in  atmospheric  observing  systems  that  their  design  involves  sacrifice 
and  compromise;  we  cannot  observe  the  behavior  of  every  molecule.  Cost  is  usually 
the  primary  limiting  factor.  While  it  may  be  desirable  to  have  another  shipboard 
radiosonde  station  or  another  satellite,  budgets  require  that  some  other  observational 
element  be  eliminated  to  make  such  additions.  Simulation  experiments  provide  an 
educated  basis  on  which  to  evaluate  the  trade-offs. 

The  planning  of  the  Global  Atmospheric  Research  Program  (GARP)  provided  the 
initial  impetus  for  use  of  observing  systems  simulation  experiments  (OSSEs).  The 
U.S.  Committee  for  GARP  (1969)  proposed  a  national  effort  in  OSSE-based  research 
as  an  aid  in  designing  a  global  observing  system.  Ambitious  requirements  had  been 
set regarding the accuracy with which the value of each atmospheric parameter
should be either measured or inferred. Many OSSEs were conducted to help the
planners decide on the best strategy to meet those requirements, given the limited
resources of the program. Studies evaluated trade-offs between system components,
such  as  polar-orbiting  versus  geostationary  satellites  (Jastrow  and  Halem,  1970)  and 
wind  versus  temperature  observations  (Williamson  and  Kasahara,  1971).  Estimates 
were  made  of  the  relationship  between  the  error  limits  specified  by  GARP  and  the 
range  and  accuracy  of  the  forecasts  that  could  be  derived  from  GARP-quality  data 
(Jastrow  and  Halem,  1970).  In  particular,  planners  wanted  to  know  the  required 
accuracy,  density,  and  frequency  of  observations  (Kasahara,  1972). 

The  global  observing  system  was  implemented  in  the  First  GARP  Global 
Experiment (FGGE). The size and diversity of the FGGE data set were
unprecedented and, thus, the use of the data in numerical weather prediction
presented  new  problems.  For  example,  satellite-based  sounders  were  new,  and  the 
data  they  produced  had  different  error  characteristics  than  the  familiar  radiosonde 
products.  Advances  were  needed  in  the  technologies  of  objective  analysis  and 
assimilation. Simulations allowed researchers to begin testing methods (for example,
synoptic versus four-dimensional assimilation; Jastrow and Halem, 1973) before the
FGGE data were available. This type of study has remained relevant into the 1980’s
as new remote sensing systems have been proposed (for example, Kuo, et al., 1987).

Another  FGGE-inspired  purpose  for  OSSEs  was  to  check  the  consistency  of 
observational  system  requirements.  There  was  concern  that  the  FGGE  requirements 
for  wind  data  were  too  lenient  relative  to  the  temperature  requirements,  and  that  the 
inconsistency  would  lead  to  a  misappropriation  of  resources  (Jastrow  and  Halem, 
1970). Arnold and Dey (1986) recommended that this kind of consistency check be
included  in  the  design  of  satellite  instruments.  For  example,  a  satellite  instrument 
designer  may  have  to  compromise  between  ground  resolution  and  noise  amplitudes. 
If  the  satellite  data  are  to  be  used  in  a  numerical  model,  the  compromise  should  be 
made  in  light  of  the  model’s  response  to  these  variables. 


2.  OSSE  DESIGNS 

There  is  considerable  variety  among  the  OSSE  designs  that  have  been  employed, 
but  the  basic  steps  are  as  follows:  First,  a  “reference  atmosphere”  is  defined  by 
integrating  a  numerical  model,  and  a  history  of  this  atmosphere  (its  temperatures, 
winds,  etc.)  is  archived.  Second,  simulated  observations  of  the  reference  atmosphere 
are  made  by  taking  history  data  at  selected  locations  and  times  and  adding  “error” 
perturbations. The observing system characteristics are accounted for in this process.
Third, the observed data are assimilated into another numerical model analysis and
(possibly)  forecast  cycle.  Fourth,  the  results  of  the  second  modeling  exercise  are 
compared  with  those  of  the  reference  integration.  The  differences  are  assumed  to  be 
similar  to  the  errors  that  would  occur  if  the  real  observing  system  were  used  in 
modeling  the  real  atmosphere. 
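
The four steps can be sketched with a toy model. The example below is only an illustration under stated assumptions: the three-variable Lorenz (1963) system stands in for the numerical model, all three variables are observed, assimilation is done by simple direct insertion, and the function names, initial conditions, error magnitude, and insertion interval are hypothetical choices, not values taken from any of the studies cited here.

```python
import math
import random

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time tendency of the Lorenz (1963) system, used here as a toy 'atmosphere'."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the toy model one step with fourth-order Runge-Kutta."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def rms_difference(a, b):
    """Root-mean-square difference between two model states."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a))

def mean(values):
    return sum(values) / len(values)

def run_osse(n_steps=4000, dt=0.01, obs_interval=25, obs_sigma=0.5, seed=0):
    rng = random.Random(seed)

    # Step 1: reference ("nature") run from initial condition A; archive its history.
    state = (1.0, 1.0, 20.0)
    history = [state]
    for _ in range(n_steps):
        state = rk4_step(state, dt)
        history.append(state)

    # Step 2: simulated observations = history values plus random error.
    observations = {
        k: tuple(v + rng.gauss(0.0, obs_sigma) for v in history[k])
        for k in range(obs_interval, n_steps + 1, obs_interval)
    }

    # Step 3: experimental run from a different initial condition B, with
    # direct insertion of the observations; a control run receives no data.
    assim = control = (-5.0, -8.0, 30.0)
    assim_err, control_err = [], []
    for k in range(1, n_steps + 1):
        assim = rk4_step(assim, dt)
        control = rk4_step(control, dt)
        if k in observations:
            assim = observations[k]
        # Step 4: compare both runs with the reference history.
        assim_err.append(rms_difference(assim, history[k]))
        control_err.append(rms_difference(control, history[k]))

    half = n_steps // 2  # average over the second half, after initial adjustment
    return mean(assim_err[half:]), mean(control_err[half:])

assim_error, control_error = run_osse()
print(f"mean RMS error with data: {assim_error:.2f}, without data: {control_error:.2f}")
```

The difference between the two printed errors is the toy counterpart of the observing system's impact. A real OSSE would replace each piece (model, observation simulation, assimilation scheme) with a far more elaborate component.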

The  components  of  an  OSSE  are  diagramed  in  Figure  1,  which  illustrates  both  the 
processes  and  the  products  that  are  involved  in  an  experiment.  The  “truth”  or 
“nature”  model  run  produces  the  reference  atmospheric  data.  If  a  general  circulation 
model  (GCM)  is  used  in  the  experiment,  the  duration  of  the  run  is  on  the  order  of 
several  weeks  and  the  initial  condition  (A,  in  Figure  1)  may  be  the  product  of  a 
multi-week  spinup  from  a  static,  uniform  atmosphere.  If  a  forecast  model  is  used,  the 
run  duration  is  on  the  order  of  hours  or  days.  The  resulting  history  data  perfectly 
represent  one  four-dimensional  atmospheric  state  that  could  occur  if  the  atmosphere 
were actually governed by the model equations (U.S. Committee for GARP, 1969).
Thus, the data are dynamically consistent with each other, and they can be available
at  whatever  spatial  and  temporal  resolution  may  be  needed  for  simulating 
observations  or  verifying  forecasts.  These  requirements  could  not  be  met  by  using 
analyses  of  real  data  to  specify  the  reference  atmosphere. 

The perturbations to the history data virtually always include a random
component, and sometimes they include a systematic component. Random
perturbations  should  account  for  noise  that  arises  in  the  collection  and  processing  of 
data  and  for  errors  that  result  when  sub-grid  scale  weather  features  make 
observations  unrepresentative  of  grid-volume  averages.  Systematic  errors  may  stem 
from  instrument  miscalibration  or  a  biased  response  of  the  observing  system  to 
particular  atmospheric  conditions.  One  example  is  a  cool  bias  in  atmospheric 
temperature  data  when  retrievals  from  infrared  satellite  sounders  are  contaminated 
by  cloud  effects.  This  type  of  error  may  be  systematic  with  respect  to  both  horizontal 
and  vertical  orientations.  Other  vertically  systematic  errors  occur  when  the  vertical 
resolution  of  a  sounding  system  is  deficient  and  smoothing  results. 
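
As a rough illustration of these error types, the sketch below perturbs a reference temperature profile with a random component, a constant bias, and optional vertical smoothing. The function names, the profile, and the error magnitudes are hypothetical and chosen only to make the three effects visible; real error models depend on the instrument and the retrieval method.

```python
import random

def smooth_vertically(profile, half_width=1):
    """Running mean over neighbouring levels, mimicking coarse vertical resolution."""
    n = len(profile)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed

def simulate_sounding(true_profile, noise_sigma, bias, rng, half_width=0):
    """Return one simulated observation of a temperature profile.

    noise_sigma: standard deviation of the random error (K), hypothetical value
    bias:        systematic error added at every level (K); negative for a cool bias
    half_width:  0 keeps full vertical resolution; >0 smooths the profile first
    """
    if half_width:
        profile = smooth_vertically(true_profile, half_width)
    else:
        profile = list(true_profile)
    return [t + bias + rng.gauss(0.0, noise_sigma) for t in profile]

rng = random.Random(42)

# Hypothetical reference profile with a sharp inversion at level 5.
truth = [280.0 - 6.0 * k for k in range(10)]
truth[5] += 8.0

# Many soundings of the same profile, so the error statistics can be checked.
soundings = [simulate_sounding(truth, noise_sigma=1.0, bias=-0.5, rng=rng)
             for _ in range(5000)]
errors = [obs[k] - truth[k] for obs in soundings for k in range(len(truth))]
mean_error = sum(errors) / len(errors)
rms_error = (sum(e * e for e in errors) / len(errors)) ** 0.5
print(f"mean error {mean_error:+.2f} K, RMS error {rms_error:.2f} K")

# Smoothing alone removes part of the inversion, a vertically systematic error.
smoothed_truth = smooth_vertically(truth, half_width=1)
inversion_loss = truth[5] - smoothed_truth[5]
print(f"inversion amplitude lost to smoothing: {inversion_loss:.2f} K")
```

The mean error recovers the imposed bias, while the RMS error combines the bias and the random component. The smoothing loss appears even with no noise at all, which is why it cannot be represented by a random perturbation.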

Objective  analysis  and  initialization  are  the  means  by  which  simulated 
observations are assimilated into an experimental analysis/forecast cycle. The
simulated observations generally are incomplete: not every parameter is specified
at every model grid point. Therefore, the simulated assimilation process depends in
part  on  a  set  of  initial  data  (B,  in  Figure  1).  For  realism,  condition  B  should  be 
substantially  different  from  condition  A.  It  might  be  some  arbitrary  atmospheric 
state,  or  it  might  be  based  on  data  from  a  source  other  than  the  observing  system  of 
interest. 

In  early  OSSEs  it  was  customary  to  use  the  same  numerical  model  to  create  the 
reference atmosphere and to conduct the simulated analysis/forecast cycle. These
became known as “identical-twin” experiments. Williamson and Kasahara (1971)
suggested that estimates of analysis/forecast errors would be more realistic if a highly
sophisticated model were used to create the reference atmosphere and a simpler one
made the analysis/forecast, given that the real atmosphere is more complicated than
any numerical model. Modeling errors stem from inaccurate treatment of physical
processes and from computational error; thus, they suggested that the forecast
model have degraded resolution and physical parameterizations. This is commonly
referred to as a “fraternal-twin” approach.

Definitions of the term OSSE generally emphasize evaluation of weather forecasts,
but many OSSE studies have focused on weather analyses, without making forecasts.
The two possible foci are represented by the dashed lines in Figure 1. The distinction
is not always clear because OSSE analyses are done on a model grid as the basis for
numerical forecasts, and some (four-dimensional) assimilation methods involve model
integration. Analysis-oriented experiments are relatively direct, since analysis errors
depend only on the data and the assimilation system, which OSSEs are intended to
evaluate. Forecasts indicate the ultimate effect of analysis errors, but forecast error
statistics also depend on the sophistication of the forecast model relative to the
reference atmosphere.


3.  APPLICATIONS 

Charney, et al. (1969) pioneered the use of “induction” experiments in relation to
GARP. The object of induction OSSEs is to test the accuracy with which one
meteorological variable can be induced in a model by continuously inserting data
from “observations” of another variable. In particular, Charney, et al. explored the
possibility of using a long time sequence of satellite-derived temperature soundings to
induce winds, making wind observations unnecessary. A GCM was used in an
identical-twin design. Their simulated observations consisted of atmospheric
temperature data covering a 60-day history of the reference atmosphere. The data
were used in four-dimensional assimilation experiments, with data insertion intervals
ranging from 1 to 21 hours. In these experiments the wind “errors” were large
initially, but their average values decreased asymptotically over the 60-day period in
response to the repeated correction of the mass field. An insertion interval of 12
hours produced the smallest asymptotic wind error.
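
The idea of induction can be demonstrated with a toy analogue. The sketch below is not the GCM experiment of Charney, et al.; it assumes the three-variable Lorenz (1963) system as a stand-in model, inserts noisy “observations” of one variable (x) at every time step, and tracks the error in the two unobserved variables (y and z). All names, initial states, and the observation error are hypothetical.

```python
import math
import random

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def tendency(s):
    """Time tendency of the Lorenz (1963) toy model."""
    x, y, z = s
    return (SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z)

def step(s, dt):
    """One fourth-order Runge-Kutta step."""
    def add(a, k, h):
        return tuple(ai + h * ki for ai, ki in zip(a, k))
    k1 = tendency(s)
    k2 = tendency(add(s, k1, dt / 2))
    k3 = tendency(add(s, k2, dt / 2))
    k4 = tendency(add(s, k3, dt))
    return tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def induction_experiment(n_steps=10000, dt=0.001, obs_sigma=0.05, seed=1):
    """Insert noisy observations of x only; return the error history of (y, z)."""
    rng = random.Random(seed)
    truth = (1.0, 1.0, 20.0)   # reference run (identical-twin design)
    model = (1.0, 6.0, 25.0)   # unobserved y and z start out wrong
    errors = []
    for _ in range(n_steps):
        truth = step(truth, dt)
        model = step(model, dt)
        x_obs = truth[0] + rng.gauss(0.0, obs_sigma)
        model = (x_obs, model[1], model[2])  # insert the observed variable
        errors.append(math.hypot(model[1] - truth[1], model[2] - truth[2]))
    return errors

errors = induction_experiment()
initial_error = errors[0]
final_error = sum(errors[-1000:]) / 1000
print(f"error in unobserved variables: initial {initial_error:.2f}, final {final_error:.3f}")
```

In this toy setting the error in the unobserved variables decays toward a floor set by the observation noise, which is the qualitative behavior described above for induced winds. The toy says nothing about the best insertion interval in a GCM, where insertion shocks and model resolution also matter.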

This work was extended by several studies. Jastrow and Halem (1970) considered
the effects of varying coverage of satellite data. Their simulated data corresponded
to realistic satellite orbits and scan patterns. A pair of polar orbiters could provide
sufficient data for wind induction, but geostationary data alone were insufficient. A
combination of the two was better than either alone. Williamson and Kasahara
(1971) investigated the converse process of inducing temperatures by insertion of
wind data. They experimented with varying insertion frequencies and related the
asymptotic error levels to noise magnitudes in the observations. Error levels were
found to be a function of latitude, implying that observation strategies should not be
the same all around the globe. Kasahara and Williamson (1972) attempted to
determine what minimum combination of observations would satisfy GARP
requirements everywhere and found that the preferred strategy was to observe winds
in the tropics and temperatures in higher latitudes.

A distinction of Kasahara and Williamson’s (1972) work was that they considered
the effects of systematic observation errors, finding them to be more detrimental than
purely random errors with the same root-mean-square value. However, the error
distribution they used (global, symmetric about the poles) was not realistic for
satellite-based soundings.

Jastrow and Halem (1970) applied their OSSE results to the question of internal
consistency of GARP data requirements. They observed that, if the GARP
requirement for temperature data were met, winds could be induced from
temperatures with an accuracy exceeding the GARP wind data requirements. This
conclusion was reinforced by Williamson and Kasahara’s (1971) study.

All of the early GARP-related OSSEs were identical-twin experiments. Results of
later studies suggested that the early results had been interpreted too liberally.
Williamson (1972) performed fraternal-twin experiments in which the resolution of
the GCM that created the reference atmosphere was finer than the one used for
induction. Asymptotic error values for induced winds and temperatures were much
larger with this approach than with identical-twin experiments. Results also
depended on the particular model used in the study. Jastrow and Halem (1972)
showed that contradictory results of three independent OSSEs appeared to stem from
differences in model resolution. Comparisons such as these were helpful in designing
later OSSEs and in interpreting their results.

In the 1980’s OSSEs have dealt with a variety of topics. There has been
considerable interest in a proposed satellite-based lidar for wind measurement around
the globe, which might be expected to greatly enhance prediction skill in regions with
few rawinsondes. Dlouhy and Halem (1984) helped to relate possible limitations in
lidar data coverage to the usefulness of the data in numerical forecasting.

OSSEs have been used to study existing observing systems as well as proposed
ones. Daley and Meyer (1986) developed an OSSE procedure to estimate the error in
our current global analyses as a function of height and latitude. By relying on a
simulation approach they had perfect reference data against which to compare
analysis results. They listed three alternative experimental methods, all using real
data, that could have been used for their study. However, they concluded that the
OSSE method was preferable because the assumptions and inferences on which the
alternatives depended were less reliable than those involved in an OSSE.

The applications mentioned above were all global-scale studies, and most of them
employed GCMs. In recent years there has been heightened interest in mesoscale
observing systems, and several mesoscale OSSEs have been done. These have
generally been identical-twin experiments with a model that has a regional domain.
The objective of Kuo and Anthes’ (1981) study was similar to that of Daley and
Meyer (1986) in that they evaluated an observing system that had already been
implemented. In particular, they investigated the accuracy with which heat and
moisture budgets could be computed from AVE-SESAME data. They estimated the
magnitudes of errors from specific sources and drew inferences regarding the design of
future special-purpose observing systems.

Kuo, et al. (1986) evaluated the accuracy of trajectory models used in studying
pollutant dispersion. Their results indicated that “the current synoptic network and
observational frequency over North America are inadequate for accurate computation
of long-range transport of episodic events”. They concluded that it would be more
cost-effective to increase the observational frequency than to enhance the spatial
resolution of the existing network. Analysis methods were evaluated to determine the
effect of using optional simplifying assumptions.

Mesoscale OSSEs have included induction experiments. Kuo and Anthes (1985)
were looking toward a proposed network of ground-based remote wind profilers when
they tested a method for inferring the mesoscale temperature distribution from
nearly-continuous wind observations. Possible network configurations were
evaluated, and options were tested regarding the combination of profiler data with
other types of data (Kuo, et al., 1987). The latter study concluded that both wind
and temperature information are needed to produce good forecasts at the mesoscale.

An OSSE provided the first testing ground for the Gal-Chen, et al. (1986)
assimilation method for mesoscale satellite-based sounding data. They found that
forecasts could benefit from increasing the frequency of geostationary sounding
observations such that data would be taken hourly. They also assessed the
importance of gaps that arise in retrievals from infrared sounder data when clouds
are present.

Oceanographers have used OSSEs with approaches similar to those of mesoscale
meteorologists. A group of researchers from the Naval Ocean Research and
Development Activity (NORDA) studied the potential use of a satellite-based sea
surface altimeter in a series of OSSEs with a numerical model covering the Gulf of
Mexico (Hurlburt, 1986; Thompson, 1986; Kindle, 1986). The focus issues in their
work were: 1) inference of subsurface information from surface data, 2) spatial and
temporal sampling requirements, 3) the feasibility of asynoptic data assimilation, and
4) evaluation of the impacts of uncertainty in the data. Both identical- and
fraternal-twin experiments were used.


4.  LIMITATIONS

OSSEs  are  inherently  complicated.  There  are  several  major  steps  in  the  process 
and  each  involves  assumptions  and  uncertainties.  Kasahara  (1972)  pointed  out  that 
this makes OSSE results difficult to interpret. For example, the peculiarities of an
analysis system may either enhance or detract from the apparent value of an
observing system as applied to modeling. This type of problem also occurs (but to a
lesser degree) in experiments that use real data instead of simulated data (Tracton, et
al., 1981; Atlas, et al., 1982).

Several limitations of OSSE studies are related to the dependence of the results on
the particular numerical model employed. At the extreme, OSSE results can be valid
only  if  the  model  is  sufficiently  similar  to  the  atmosphere  that  it  can  simulate  the 
meteorological  phenomena  of  interest.  For  example,  tropical  observing  systems 
cannot  be  evaluated  with  a  GCM  that  lacks  the  forcing  mechanisms  for  tropical 
convection  (Jastrow  and  Halem,  1970). 

Given an adequate model, the limits on interpretation of results depend heavily on
how  the  model  is  used  in  the  OSSE.  Identical-twin  experiments  are  particularly 
limited.  Part  of  their  problem  is  the  compatibility  issue  addressed  by  Morel,  et  al. 
(1971).  Data  simulated  from  a  numerical  model  run  are  highly  consistent  with  the 
slow  normal  modes  of  that  model.  If  the  same  model  is  used  for  an  analysis/forecast, 
the  data  should  be  very  readily  assimilated.  If,  on  the  other  hand,  the  data  come 
from  a  system  (for  example,  the  real  atmosphere)  with  different  normal  modes,  the 
data  might  be  poorly  assimilated.  Beneficial  effects  of  the  data  depend  on  thorough 
assimilation. Thus, identical-twin results may depend on an unrealistically good a
priori fit between the data and the model dynamics.

A  second  limitation  of  identical-twin  experiments  is  that  they  cannot  give  reliable 
estimates  of  real-world  forecast  errors.  Analysis  errors  depend  on  many  factors, 
including  the  quality  of  the  observing  system,  but  the  growth  of  those  errors  during 
an identical-twin forecast run depends only on the predictability of the model
atmosphere (Williamson, 1973). By predictability, we mean the tendency for two
nearly-identical initial states of a model to yield very different forecasts after a long
integration.  In  reality  the  forecast  error  grows  due  to  model  imperfections  as  well  as 
predictability  limits. 

Fraternal-twin experiments can account to some degree for the imperfections of
forecast models relative to the real atmosphere. However, even this experimental
design has limitations, since the atmosphere differs more from any model than any
two models differ from each other. Forecast errors are likely to be underestimated since
there  is  much  in  common  among  the  ways  models  parameterize  the  physics  of  the 
atmosphere  (Jastrow  and  Halem,  1973). 

To  evaluate  an  observing  system  by  simulation,  the  error  characteristics  of  the 
observational  data  must  be  accurately  represented.  Unrealistic  methods  have  been 
used  in  most  OSSEs  to  introduce  error  to  observations  of  the  reference  atmosphere. 
The  conventional  approach  is  to  add  random  and/or  systematic  errors  to  the 
reference  data  according  to  the  expected  behavior  of  the  observing  instrument.  This 
works  poorly  when  the  observations  are  from  remote  sensors.  In  satellite-based 
temperature  soundings,  for  example,  the  vertical  and  horizontal  structure  of  errors  in 
retrieved  temperatures  depends  on  many  meteorological  factors  and  on  the  retrieval 
algorithm.  For  remote  sensors  it  is  more  realistic  to  go  through  the  intermediate 
steps  of  simulating  the  observed  data  (for  example,  radiances)  from  the  history  of  the 
reference  atmosphere  and  then  retrieving  the  meteorological  data  (for  example, 
temperatures). Atlas, et al. (1984) described in detail how this can be done. This
method also has limitations, however, because radiance simulation requires detailed
information about cloudiness, more detailed than forecast models can provide
directly. Inferences and assumptions are needed.

The  several  steps  in  conducting  an  OSSE  require  a  large  amount  of  computer  time, 
even  for  relatively  simple  experimental  designs.  Therefore,  researchers  typically  rely 
on  a  single  analysis/forecast  for  each  treatment  in  their  experiments.  This  approach 
yields  less  reliable  results  than  repeated  analysis/forecasts  with  different 
meteorological  conditions  (Arnold  and  Dey,  1986),  which  allows  for  computing 
ensemble  statistics.  This  issue  may  be  particularly  important  for  mesoscale  OSSEs 
because  a  relatively  narrow  range  of  meteorological  conditions  can  occur  within  the 
time and space limits of a single mesoscale analysis/forecast.


5.  PROSPECTS


Despite  the  limitations  of  OSSEs,  they  are  a  useful  means  to  evaluate  current  and 
proposed  observing  systems  and  analysis  methods.  In  many  instances  there  is  no 
better  alternative.  One  issue  to  be  considered,  however,  is  the  cost  of  OSSEs  in  terms 
of  human  and  computer  time.  If  the  purpose  is  to  determine  whether  to  install  a 
proposed  observing  system,  and  an  OSSE  would  cost  more  than  the  system,  then  it 
would  be  best  to  skip  the  OSSE  and  go  ahead  with  installation  (Arnold  and  Dey, 
1986). 

When  an  OSSE  is  worth  the  cost,  researchers  must  design  the  experiments 
carefully  and  exercise  great  restraint  in  interpreting  their  results.  The  OSSE's  design 
must  be  logically  related  to  its  purpose  and  objectives,  and  all  these  aspects  of  the 
experiments  are  constrained  by  the  limitations  inherent  to  OSSEs.  For  example,  if  a 
satellite-based  wind-sensing  lidar  is  being  planned,  one  conceivable  purpose  for  an 
OSSE  is  to  learn  the  accuracy  of  forecasts  that  would  result  from  using  the  lidar  data 
in a state-of-the-art forecast model. This purpose is unrealistic given the limitations
of  OSSEs  discussed  earlier.  Furthermore,  the  fraternal-twin  approach  is  not  an 
available  design  option  for  evaluating  state-of-the-art  models.  A  more  realistic 
purpose  would  be  to  determine  whether  it  is  likely  that  lidar  data  would  have  a 
significant  beneficial  effect  on  forecasts.  In  addition,  OSSEs  can  be  very  useful  for 
intercomparing  forecasts  made  with  varying  amounts  and  qualities  of  lidar  data. 

It  is  possible  to  draw  valid  conclusions  about  an  observing  system  only  if  the 
observed  data  are  realistically  simulated.  For  remote  sensors  it  will  generally  be 
necessary  to  make  retrieval  of  meteorological  parameter  values  a  part  of  the  OSSE 
procedure.  Once  meteorological  data  are  simulated  at  the  observation  sites,  a 
realistic  method  must  be  used  to  interpolate  the  data  to  the  model  grid. 

The  horizontal  and  vertical  resolutions  of  the  model  must  be  compatible  with  the 
observing  system  being  evaluated.  If  the  resolvable  scales  of  the  model  are  broader 
than  those  of  the  observing  system,  then  some  information  may  be  wasted  and  the 
OSSE  will  not  be  a  fair  test  of  the  system’s  value.  The  resolution  must  also  be 
sufficient  to  simulate  the  relevant  meteorological  phenomena. 

Experimental  designs  generally  should  be  fraternal-twin  rather  than  identical-twin, 
so  that  the  results  can  be  interpreted  most  broadly.  The  relative  simplicity  of 
identical-twin experiments makes them preferable in some situations, such as initial
testing of an analysis technique (for example, Gal-Chen, et al., 1986). When the
analysis/forecast  model  can  be  considered  perfect  the  OSSE  results  are  relatively  easy 
to  interpret;  there  are  fewer  possible  sources  for  any  errors  in  the  analysis.  If  a 
method  shows  promise  in  an  identical-twin  experiment,  then  fraternal-twin  and/or 
real-data experiments should be employed to further evaluate the method.

Most OSSE designs have relied on simple, objective measures, such as root-mean-
square errors, for evaluation of analysis/forecast results. This is understandable
given  the  large  quantities  of  data  involved.  Interpretations  of  statistics  such  as  these 
should  include  tests  of  significance  (Arnold  and  Dey,  1986).  Whenever  possible,  it  is 
advantageous  to  make  subjective  evaluations  also,  which  may  bring  to  light 
meteorologically  significant  features  of  the  results  that  could  be  hidden  in  simple 
statistics.  It  is  also  helpful  at  times  to  stratify  the  results  (by  latitude,  for  example)  to 
highlight  the  effect  of  the  observing  system  on  a  particular  region  or  under  a  limited 
set  of  meteorological  conditions. 
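
The objective measures mentioned above can be sketched in a few lines. The example below shows, under stated assumptions, a root-mean-square error stratified by latitude band and a paired t statistic for comparing two experiments over several cases. The function names, the 30-degree band width, and all data values are hypothetical; the paired t test is one common choice of significance test, not necessarily the one intended by Arnold and Dey (1986).

```python
import math
import statistics

def rmse(errors):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def rmse_by_latitude_band(records, band_width=30.0):
    """Group (latitude, error) pairs into bands and return the RMS error of each.

    Bands are labelled by their southern edge, so 30.0 covers 30N up to 60N.
    A point exactly at the north pole is folded into the northernmost band.
    """
    bands = {}
    for lat, err in records:
        edge = min(math.floor(lat / band_width) * band_width, 90.0 - band_width)
        bands.setdefault(edge, []).append(err)
    return {edge: rmse(errs) for edge, errs in sorted(bands.items())}

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for per-case differences (a minus b)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

# Hypothetical (latitude, analysis error) pairs from one experiment.
records = [(-75.0, 2.0), (-10.0, 4.0), (5.0, 3.0), (45.0, 1.0), (50.0, 1.0), (90.0, 2.0)]
banded = rmse_by_latitude_band(records)
for edge, value in banded.items():
    print(f"band starting at {edge:+.0f} deg: RMS error {value:.2f}")

# Hypothetical RMS errors for five independent cases, without and with the new data.
rms_without = [3.0, 3.0, 2.5, 3.4, 2.6]
rms_with = [2.0, 2.5, 1.5, 2.2, 1.8]
t_value = paired_t_statistic(rms_without, rms_with)
print(f"paired t statistic over {len(rms_with)} cases: {t_value:.2f}")
```

With five cases there are four degrees of freedom, for which the two-sided 5% critical value of t is about 2.78. A statistic above that value would suggest the improvement is unlikely to be due to case-to-case variability alone, provided the cases are independent.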

There  is  reason  to  believe  that  future  meteorological  research  will  include  many 
OSSEs.  There  has  recently  been  strong  interest  in  using  new,  remotely-sensed  data  in 
numerical  models,  and  in  combining  datasets  from  different  sources  within  the 
context  of  a  model  grid.  Certainly  modeling  applications  will  be  a  major 
consideration  for  future  observing  systems.  In  this  regard,  one  major  limitation  of 
OSSEs,  model  dependence,  is  becoming  less  acute.  The  models  available  to 
researchers  are  growing  in  number,  sophistication,  and  variety,  and  the  growth  of 
computer  power  makes  it  possible  to  increase  the  realism  of  many  parts  of  the  OSSE 
process.